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Abstract —Efficiently finding similar segments or motifs in time 
series data is a fundamental task that, due to the ubiquity of 
these data, is present in a wide range of domains and situations. 
Because of this, countless solutions have been devised but, to 
date, none of them seems to be fully satisfactory and flexible. In 
this article, we propose an innovative standpoint and present a 
solution coming from it: an anytime multimodal optimization 
algorithm for time series motif discovery based on particle 
swarms. By considering data from a variety of domains, we show 
that this solution is extremely competitive when compared to 
the state-of-the-art, obtaining comparable motifs in considerably 
less time using minimal memory. In addition, we show that it is 
robust to different implementation choices and see that it offers 
an unprecedented degree of flexibility with regard to the task. All 
these qualities make the presented solution stand out as one of 
the most prominent candidates for motif discovery in long time 
series streams. Besides, we believe the proposed standpoint can 
be exploited in further time series analysis and mining tasks, 
widening the scope of research and potentially yielding novel 
effective solutions. 

Index Terms —Particle swarm, multimodal optimization, time 
series streams, motifs, anytime. 

I. Introduction 

T ime series are sequences of real numbers measured 
at successive, usually regular time intervals. Data in the 
form of time series pervade science, business, and society. 
Examples range from economics to medicine, from biology to 
physics, and from social to computer sciences. Repetitions or 
recurrences of similar phenomena are a fundamental charac¬ 
teristic of non-random natural and artificial systems and, as 
a measurement of the activity of such systems, time series 
often include pairs of segments of strikingly high similarity. 
These segment pairs are commonly called motifs ||T], and their 
existence is unlikely to be due to chance alone. In fact, they 
usually carry important information about the underlying sys¬ 
tem. Thus, motif discovery is fundamental for understanding, 
characterizing, modeling, and predicting the system behind 
the time series Q. Besides, motif discovery is a core part of 
several higher-level algorithms dealing with time series, in par¬ 
ticular classification, clustering, summarization, compression, 
and rule-discovery algorithms (see, e.g., references in Q, Q). 

Identifying similar segment pairs or motifs implies examin¬ 
ing all pairwise comparisons between all possible segments 
in a time series. This, specially when dealing with long 
time series streams, results in prohibitive time and space 
complexities. It is for this reason that the majority of motif 
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discovery algorithms resort to some kind of data discretization 
or approximation that allows them to hash and retrieve seg¬ 
ments efficiently. Eollowing the works by Lin et al. |[T] and 
Chiu et al. Q, many of such approaches employ the SAX rep¬ 
resentation Q and/or a sparse collision matrix Q. These allow 
them to achieve a theoretically low computational complexity, 
but sometimes at the expense of very high constant factors. In 
addition, approximate algorithms usually suffer from a number 
of data-dependent parameters that, in most situations, are not 
intuitive to set (e.g., time/amplitude resolutions, dissimilarity 
radius, segment length, minimum segment frequency, etc.). 

A few recent approaches overcome some of these lim¬ 
itations. Eor instance, Castro & Azevedo Q propose an 
amplitude multi-resolution approach to detect frequent seg¬ 
ments, Li & Lin Q use a grammar inference algorithm 
for exploring motifs with lengths above a certain threshold, 
Wilson et al. 0 use concepts from immune memory to 
deal with different lengths, and Eloratou et al. ID combine 
suffix trees with segment models to find motifs of any length. 
Nevertheless, in general, these approaches still suffer from 
other data-dependent parameters whose correct tuning can 
require considerable time. In addition, approximate algorithms 
are restricted to a specific dissimilarity measure between seg¬ 
ments (the one implicit in their discretization step) and do not 
allow easy access to preliminary results, which is commonly 
known as anytime algorithms ID- Einally, to the best of our 
knowledge, only (Tg-fg consider the identification of motif 
pairs containing segments of different lengths. This can be 
considered a relevant feature, as it produces better results in a 
number of different domains O- 

In contrast to approximate approaches, algorithms that do 
not discretize the data have been comparatively much less pop¬ 
ular, with low efficiency generally. Exceptions to this statement 
achieved efficiency by sampling the data stream ID or by 
identifying extreme points that constrained the search 116|. In 
fact, until the work of Mueen et al. ID’ the exact identification 
of time series motifs was thought to be intractable for even 
time series of moderate length. In said work, a clever segment 
ordering was combined with a lower bound based on the 
triangular inequality to yield the true, exact, most similar 
motif. According to the authors, the proposed algorithm was 
more efficient than existing approaches, including all exact 
and many approximate ones ID- After Mueen et al.’s work, 
a number of improvements have been proposed, the majority 
focusing on eliminating the need to set a fixed segment 
length |g-|g. 

Mueen himself has recently published a variable-length 
motif discovery algorithm which clearly outperforms the it- 
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erative search for the optimal length using | [T7| and, from the 
reported numbers, also outperforms further approaches such 
as |[T8|-p0j. This algorithm, called MOEN Q, is essentially 
parameter-free, and is believed to be one of the most efficient 
motif discovery algorithms available nowadays. However, its 
complexity is still quadratic in the length of the time series Q, 
and hence its applicability to large-scale time series streams 
remains problematic. Furthermore, in order to derive the lower 
bounds used, the algorithm is restricted with regard to the 
dissimilarity measure used to compare time series segments 
(Euclidean distance after z-normalization). In general, exact 
motif discovery algorithms have important restrictions with 
regard to the dissimilarity measure, and many of them still 
suffer from being non-intuitive and tedious to tune parameters. 
Moreover, few of them allow for anytime versions and, to the 
best of our knowledge, not one of them is able to identify 
motif pairs containing segments of different lengths. 

In this article, we propose a new standpoint to time se¬ 
ries motif discovery by treating the problem as an anytime 
multimodal optimization task. To the best of our knowl¬ 
edge, this standpoint is completely unseen in the literature. 
Here, we firstly reason and discuss its multiple advantages 
(Sec. 1^. Next, we present SWARMMOTIF (Sec. [nl| , an 
anytime algorithm for time series motif discovery based on 
particle swarm optimization (PSO). We subsequently evaluate 
the performance of the proposed approach using 9 different 
real-world time series from distinct domains (Sec. Our 
results show that SWARMMOTIF is extremely competitive 
when compared to the state-of-the-art, obtaining motif pairs 
of comparable similarity in considerably less time and with 
minimum storage requirements (Sec. 0. Moreover, we show 
that SWARMMOTIF is significantly robust against different 
implementation choices. To conclude, we briefly comment 
on the application of multimodal optimization techniques to 
time series analysis and mining, which we believe has great 
potential (Sec. The data and code used in our experiments 
will available online. 

11. Time Series Motif Discovery as an Anytime 
Multimodal Optimization Task 

A. Definitions and Task Complexity 

From the work by Mueen et al. we can derive a 

formal, generic similarity-based definition Q of time series 
motifs. Given a time series z of length n, z = [zi,... Zn], a 
normalized segment dissimilarity measure D, and a temporal 
window of interest between Wniin and Wmax samples, the top- 
k time series motifs A4 = {mi,.. .m^} correspond to the k 
most similar segment pairs z““ = [za, ■ •. Za+Wa-i] and z™*” = 
[Z 5 , . . . — for tUfl, Wl) € ['ttJmin; H ^ [1; 12 

and 6 S [1, n — Wb -f 1] where, in order to avoid repeated 
and trivial matches Q, O -I- tUa < b. Thus, the i-th motif 
can be fully described by the tuple = {a,Wa,b,Wb}. The 
motifs in AA are non-overlapping and ordered from lowest 
to highest dissimilarity such that i9(mi) < D{in 2 ) < • • • < 
D{mk) where D{mi) = D{{a,Wa,b,Wb}) = 

^Notice that, following j^, this dehnition can be trivially extended to 
different degrees of overlap. 



i [samples] 

Fig. 1. Example of a time series motif pair found in the WILDLIFE time 
series of |21| using SWARMMOTIF and normalized dynamic time warping 
as the dissimilarity measure: a = 248, Wa = 244, b = 720, and Wb = 235. 
Note that Wa f W), and, hence, a warping of the two segments needs to 
take place. This specific solution cannot be found by any of the approaches 
mentioned in Sec. ||] 

An example of a time series motif pair from a real data set is 
shown in Fig. 

It is important to stress that D needs to normalize with 
respect to the lengths of the considered segments. Otherwise, 
we would not be able to compare motifs of different lengths. 
There are many ways to normalize with respect to the length of 
the considered segments. Ratanamahatana & Keogh | [22) list 
a number of intuitive normalization mechanisms for dynamic 
time warping that can easily be applied to other measures. For 
instance, in the case of a dissimilarity measure based on the 
Fp norm p^ , we can directly divide by the segment lengtlj^ 
using brute-force upsampling to the largest length p2| when 
Wa 7^ Wb- 

From the definitions above, we can see that a brute-force 
search in the motif space for the most similar motifs is of 
O(n'^WA^), where wa = w^ax — + 1 (for the final 

time complexity one needs to further multiply by the cost 
of calculating D). Hence, for instance, in a perfectly feasible 
case where n = 10^ and wa — 10^, we have 10^° possibilities. 
Magnitudes like this challenge the memory and speed of any 
optimization algorithm, specially if we have no clue to guide 
the search p4) . However, it is one of our main objectives to 
show here that time series generally provide some continuity 
to this search space, and that this continuity can be exploited 
by optimization algorithms. 

B. Continuity 

A fundamental property of time series is autocorrelation, 
implying that consecutive samples in a time series have some 
degree of resemblance and that, most of the time, we do not 
observe extremal differences between thenp] This property, 
together with the established ways of computing similarity 
between time series p^ , is what gives continuity to our search 
space. Consider a typical dissimilarity measure like dynamic 
time warping between z-normalized segments and the time 
series of Fig. If we fix the motif starting points a and b 
to some random values, we can compute i9(z^,z^) for i,j = 
LUmin, ■ ■ ■, iFmax (Fig. |^). We See that these two dimensions 

^The only exception is with Loo, which could be considered as already 
being normalized. 

^If a time series had no autocorrelation, we might better treat it as an 
independent random process. 
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Fig. 2. Visualizations of the search space obtained with the WILDLIFE time 
series ED and dynamic time warping as the dissimilarity measure: (A) fixing 
a = 110 and b = 602 but i^j = 200,... 250 and (B) fixing Wa = 222 and 
Wf) = 240 but i,j = 1,... 500. Darker colors corresponds to more similarity. 


have a clear continuity, i.e., that D(z^,z^) ~ ~ 

D(z^,z^’'’^) ~ £)(z^+^,z^’^^), and so forth. Similarly, if we fix 
the motif lengths Wa and Wb to some random values, we can 
compute £>(z^“, zj'’) for i = 1,.. .n — Wa and j = 1,.. .n — 
Wb (Fig. Efi). We see that the remaining two dimensions of 
the problem also have some continuity, i.e., D{zf'^,zJ‘’~^^) ^ 
D{z^^,,zJ^) ~ Dizr^^zJl,) ^ D{z^„zJl,), and so forth. 
The result is a four-dimensional, multimodal, continuous but 
noisjj^motif space, where the dissimilarity D acts as the fitness 
measure and the top-fc valley peaks (considering dissimilarity) 
correspond to the top-fc motifs in M.. 



I[s] 

Fig. 3. Schema of a plot to assess the performance of an anytime motif 
discovery algorithm (blue curve). Error bars indicate 5 and 95 percentiles 
of D{mi), their central marker indicates the median, and the isolated dots 
indicate maximum and minimum values. The gray area at the bottom denotes 
the area where D(mi)* lie. The top-left black error bar acts as a reference 
and shows the range of D(ini) obtained by random sampling the motif 
space. The bottom-right red error bar is placed at the time that the baseline 
algorithm spent in the calculations. Better performing anytime algorithms have 
a curve closer to the bottom left corner, quickly entering the gray area as their 
execution time increases. Notice that both axes are logarithmic. 


C. Anytime Solutions 

Finding an optimization algorithm that can locate the global 
minima of the previous search spaces faster than existing motif 
discovery algorithms can be a difficult task. However, we 
have robust and established algorithms for efficiently locating 
prominent local minima in complex search spaces |[^-|[Z7). 
Hence, we can intuitively devise a simple strategy: if we 
keep the best found minima and randomly reinitialize the 
optimization algorithm every time it stagnates, we should, 
sooner or later, start locating the global minima. In the mean¬ 
time, we could have obtained relatively good candidates. This 
corresponds to the basic paradigm of anytime algorithms |TT). 

Anytime algorithms have recently been highlighted as 
“very beneficial for motif discovery in massive [time series] 
datasets” GB- In an anytime algorithm for motif discovery, 
D{ini) improves over time, until it reaches the top-fc dis¬ 
similarity values D(mi)* obtained by a brute-force search 
approach. Thus, we gradually improve M until we reach 
the true exact solution A4*. A good anytime algorithm will 
quickly find low £)(mi), ideally reaching earlier than 

its non-anytime competitors (Fig. [^. 

Note that a sufficiently good A4 may suffice in most situa¬ 
tions, without the need that M = A4*. This is particularly true 
for more exploratory tasks, where one is typically interested 
in data understanding and visual inspection (see Q), and can 
also hold for other tasks, as top-fc motifs can be very similar 
among themselves. In the latter situation, given a seed within 

'^We use the term noisy here to stress that the continuity of the space may 
be altered at some points due to potential noise in the time series. It is not 
the case that we have a noisy, unreliable dissimilarity measurement D that 
could change in successive evaluations. 


Ad*, we can easily and efficiently retrieve further repetitions 
via common established approaches | [28) , p9) . Thus, only 
non-frequent or singular motifs may be missed. These can 
be valuable too, as the fact that they are non-frequent does 
not imply that they cannot carry important information (think 
for example of extreme events of interest that perhaps only 
happen twice in a measurement). For those singular motifs, 
we can wait longer if using an anytime algorithm, or we can 
resort to the state-of-the-art if that is able to provide its output 
within an affordable time limit. 


D. Particle Swann Optimization 

The continuity and anytime observations above (Secs. |II-B| 
and |II-C| l relax the requirements for the optimization algorithm 
to be employed in the considered motif spaces. In fact, if 
we do not have to assess the global optimality of a solution, 
we have a number of approaches that can deal with large, 
multimodal, continuous but noisy search spaces @-11^. 
Among them, we choose PSO p0)-||34). PSO is a population- 
based stochastic approach for solving continuous and discrete 
optimization problems which has been applied to multi¬ 


modal problems |35|. It is a metaheuristic |27|, meaning that 
it cannot guarantee whether the found solution corresponds to 
a global optimum. The original PSO algorithm cannot even 
guarantee the convergence to a local optimum, but adapted 
versions of it have been proven to solve this issue p^ . Other 
versions guarantee the convergence to the global optimum, but 
only with the number of iterations approaching infinity p^ . 

PSO has gained increasing popularity among researchers 
and practitioners as a robust and efficient technique for solving 
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difficult optimization problems. It makes few or no assump¬ 
tions about the problem being optimized, does not require it 
to be differentiable, can search very large spaces of candidate 
solutions, and can be applied to problems that are irregular, 
incomplete, noisy, dynamic, etc. (see p0)-p5j and references 
therein). PSO iteratively tries to improve a candidate solution 
with regard to a given measure of quality or fitness function. 
Hence, furthermore, it can be considered an anytime algorithm. 


E. Advantages of an Optimization-Based Solution Using Par¬ 
ticle Swanns 

Notice that treating time series motif discovery as an 
optimization problem naturally yields several advantages; 

1) We do not require much memory, as we can basically store 
only the stream time series and preprocess the required 
segments at every fitness evaluation. 

2) We are able to achieve a certain efficiency, as optimization 
algorithms do not usually explore the full solution space 
and perform few fitness evaluations p4| . 

3) We can employ any dissimilarity measure D as our fitness 
function. Its only requirements are segment length indepen¬ 
dence and a minimal search space continuity. Intuitively, 
this holds for the high majority of time series dissimilarity 
measures that are currently used (Secs. II-A and |II-B| l. 
Additionally, we can straightforwardly incorporate notions 
of ‘interestingness’, hubness, or complexity (see references 


in |23|). This flexibility is very uncommon in current time 
series motif discovery algorithms (Sec. |I]|. 

4) We do not need to force the two segments of the motif to 
be of the same length. The dissimilarity function D can 
expressly handle segments of different lengths or we can 
simply upsample to the largest length (see ||22|). Although 
considering different segment lengths has been highlighted 
as an objectively better approach, practically none of the 
current time series motif discovery algorithms contemplates 
this option (Sec. |I]l. 

5) Since we search for the optimal Wa and Wb, together with 
a and b, we do not need to set the exact segment lengths 
as a parameter. Instead, we can use a more intuitive and 
easier to set range of lengths Wa,Wb & [tUmin, tfniax]■ 

6) We can easily modify our fitness criterion to work with 
different task settings. Thus, just by replacing D, we are 
able to work with multi-dimensional time series detect 
sub-dimensional motifs p8) , perform a constrained motif 
discovery task fib) , etc. 

7) We can incorporate notions of motif frequency to our 
fitness function and hence expand our similarity-based 
definition of motif to incorporate both notions For 
instance, instead of optimizing for individual motifs irii, 
we can optimize sets of motifs of size such that 

is minimal. We can choose to be a 
minimum frequency of motif appearance or we can even 
decide to optimize it following any suitable criterion. 

In addition, using PSO has a number of interesting properties, 

some of which may be shared with other metaheurisics: 

1) We have a straightforward mapping to the problem at hand 
(Sec. lIlTAl i. 


2) By construction, we have an anytime algorithm (Sec. |II-E)] l. 

3) We can obtain accurate and much faster solutions, as com¬ 
pared to the state-of-the-art in time series motif discovery 
(Sec. [V^ . 

4) We have an essentially parameter-free algorithm p^ . As 
will be shown, all our parameter choices turn out to be 
non-critical to achieve the most competitive performances 
(Secs. [V^ and |V^. 

5) We have an easily parallelizable algorithm. The agent- 
based nature of PSO naturally yields to parallel implemen¬ 
tations | [^ . 

6) We still have the possibility to apply lower bounding tech¬ 
niques to D in order to reduce its computational cost Q, 
p9) . Among others, we may exploit the particles’ best-so- 
far values or spatially close dissimilarities. 

7) All of these use a simple, easy to implement algorithm 
requiring low storage capabilities (Sec. [IlTBl i. 


III. SWARMMOTIF 


A. Main Algorithm 

Our PSO approach to time series motif discovery is based on 
the combination of two well-known extensions to the canonical 
PSO ([^. On one hand, we employ multiple reinitializations 
of the swarm on stagnation p9) . On the other hand, we exploit 
the particles’ “local memories” with the intention of forming 
stable niches across different local minima The former 
emulates a parallel multi-swarm approach p5| without the 
need of having to define the number of swarms and their 
communication. The latter, when combined with the former. 


results in a low-complexity niching strategy 1351 that does 
not require niching parameters (see the related discussion 
in pT) , |42|). SWARMMOTIF, the implementation of the two 
extensions, is detailed in Algorithmic 

SWARMMOTIF takes a time series z of length n as input, 
together with a segment dissimilarity measure D, and the 
range of segment lengths of interest, limited by Wmin and 
Wmax- The user also needs to specify k, the desired number of 
motifs, and tmax, the maximum time spent by the algorithm 
(in iteration^. SWARMMOTIF outputs a set of k non¬ 
overlapping motifs A4 ■ We implement AI as a priority queue, 
which typically stores more than k elements to ensure that it 
contains k non-overlapping motifs. This way, by sorting the 
motif candidates as soon as they are found, we allow potential 
queries to A4 at any time during the algorithm’s execution. In 
that case, we only need to dynamically check the candidates’ 
overlap (Sec. III-B[ ). Notice that n, D, tUmin, Wmax, k, and 
fmax are not parameters of the algorithm, but requirements 
of the task (they depend on the data, the problem, and the 
available time). The only parameters to be set, as specified 
in Algorithm [Cs requirements, are the number of particles k, 
the topology 9, the constriction constant and the maximum 
amount of iterations at stagnation r. Nevertheless, we will 
show that practically none of the possible parameter choices 


^The number of iterations is easy to infer from the available time as, for 
the same input, the elapsed time will be roughly directly proportional to the 
number of iterations. 
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Algorithm 1 SWARMMOTIF(z,£),Wniin,l(;jnax.fcj^max) 


Algorithm 2 lNITIALIZESWARM(n,tymin,Wmax,K) 


Input; Time series z of length n, dissimilarity measure D, 
minimum and maximum segment length u>min and tUmax, 
number of motifs k, and maximum amount of time (num¬ 
ber of iterations) 

Require: Number of particles k, topology 6, constriction 
constant (p, and maximum amount of time at stagnation 
(number of iterations) r. 

Output: A set of motifs M. 


1 

2 

3 

4 

5 

6 

7 

8 
9 

10 

11 

12 

13 

14 

15 

16 

17 

18 

19 

20 
21 
22 

23 

24 

25 

26 
27 


Co,Ci,C2 ^ GETC0NSTANTS((/)) 

X,V,S,'P 3- lNITIALIZESWARM(n,r(;inin,Wmax,«;) 

0 ^ InitializeTopology(0,k) 

S* ^ 00 

M ^ EmptyPriorityQueueO 

for t = 1, . . . fniax 
for i = 1,.. .K 

if VALIDPOSITION(Xi) 
d ■<— D(xi) 
it d < Si 
Si 3— d 
P* ^ X, 

Al.PusH((i,Xi) 
if d < s* 
s* ^ d 

^update f 

for i = 1,. . .K 
9 ^ i 
for j in 0i 

if Sj < Sg 

9 3 

Vi ^ Cq\i + CiUi (g) (Pi - Xi) -f C2U2 <g) (Pg - Xi) 
Xi ^ Xi + Vi 
if f ^update — 7" 

X,V,S,'P ^ lNITIALIZESWARM(n,r(;inm,tPmax,K) 

S* ^ 00 

return NONOvERLAPPiNG(Al,fc) 


introduces a significant variation in the reported performance 
(Sec. |V^ . 

Having clarified SWARMMOTIF’s input, output, and re¬ 
quirements, we now elaborate on its procedures. Algorithm [T] 
starts by computing the velocity update constants (line 1) 
following Clerc’s constriction method |[43), i.e.. 


Co 


2 


and 

Cl = C2 = Co(j)/2. (1) 


Next, a swarm with k particles is initialized (line 2). The 
swarm is formed by four data structures: a set of particle 
positions X = {xi,...Xk}, a set of particle velocities V = 
{vi,...Vk}, a set of particle best scores S = {si,...Sk}, 
and a set of particle best positions V = {Pi,-- Pk} (the 
initialization of these four data structures is detailed in Algo¬ 
rithm]^. Particles’ positions x^ and p^ completely determine 


Input: Time series length n, minimum and maximum seg¬ 
ment length uitnin and tUmax, and number of particles k. 
Output: Particle positions X, velocities V, best scores S, and 
best positions V. 

1 : for i = 1,... K 

2: Xi^2 ^ Wmin -f (w max 1 

3: Xi^4 ^ Wmin + (w max 1 '^min)w 

4: Xi^i^ 1 + {n-Xi^2){t- V^) 

5: ^ 1 + {n - Xi^4 - {Xi^i + Xi^ 2 ))u 

6: x' •<— As in lines 2—5 

7: Vi 3- x' - Xi 

8 : Si <— 00 

9: Pi ^ Xi 

10: return X,V,S,V 


a motif candidate, and have a direct correspondence with nii 
(see Sec. [IlTBl i. A further data structure 0 indicates the indices 
of the neighbors of each particle according to a given social 
topology 9 (line 3). Apart from the swarm, we also initialize a 
global best score s* (line 4) and the priority queue M (line 5). 
We then enter the main loop (lines 6-26). In it, we perform 
three main actions. Firstly, we compute the particles’ fitness 
and perform the necessary updates (lines 7-16). Secondly, we 
modify the particles’ position and velocity using their personal 
and neighborhood best positions (lines 17-23). Thirdly, we 
control for stagnation and reinitialize the swarm if needed 
(lines 24-26). Finally, when we exit the loop, we return the 
first k non-overlapping motif candidates from A4 (line 27). 

The particles’ fitness loop (lines 7-16) can be described 
as follows. For the particles that have a valid position within 
the ranges used for particle initializations (line 8; see also 
Algorithm for initializations), we calculate their fitness D 
(line 9) and, if needed, update their personal bests Si and p^ 
(lines 10-12). As mentioned, D needs to be independent of 
the segments’ lengths, which is typically an easy condition 
for time series dissimilarity measures (Sec. II-A| i. In the case 
that the particles find a new personal best, we save the motif 
dissimilarity d and its position x^ into Xi (line 13). Next, we 
update fupdate^ the last iteration when an improvement of the 
global best score s* has occurred (lines 14-16). 

The particles’ update loop (lines 17-23) is straightforward. 
We first select each particle’s best neighbor g using the 
neighborhood personal best scores Sj (lines 18-21). Then, 
we use the positions of the best neighbor’s personal best 
Pg and the particle’s personal best Pj to compute its new 
velocity and position (lines 22-23). We employ component¬ 
wise multiplication, denoted by ®, and two random vectors Ui 
and U 2 whose individual components Uij — (7(0,1), being 
U{l,h) a uniform real random number generator such that 
I < U{l,h) < h. Note that by considering the particles’ neigh¬ 
borhood personal bests Pg we follow the aforementioned local 
neighborhood niching strategy pO) . At the end of the loop we 
control for stagnation by counting the number of iterations 
since the last global best update and applying a threshold t 
(line 24). Note that this is the mechanism responsible for the 
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aforementioned multiple reinitialization strategy | [39]. 

The initialization of the swarm used in AlgorithmTn (lines 2 
and 25) is further detailed in Algorithm In it, for each 
particle, two random positions and x' are drawn (lines 2 - 6 ) 
and the initial velocity is computed as the subtraction of the 
two (line 7). To obtain x^ and x', uniform real random numbers 
u = 17(0,1) are subsequently generated. The personal best 
score Si is set to infinite (line 8 ) and x^ is taken as the current 
best position (line 9). Note that ^/u (line 4) is used to ensure 
a uniform distribution of the particles across the triangular 
subspace formed by cci i and Xj 3 (line 5; see also Sec. II-Ai. 


B. Implementation Details 

Some implementation details are missing in Algorithms [T] 
and 1^ Firstly, positions x^ are floored component-wise in¬ 
side ValidPosition, D, and M. (thus obtaining motif irii). 
Secondly, the motif priority queue is implemented as an 
associative container (logarithmic insertion time) that sorts its 
elements according to d and stores mi. Thirdly, the last visited 
positions are cached into a hash table (constant lookup time) 
in order to avoid some of the possible repeated dissimilarity 
computations. Fourthly, we incorporate the option to constrain 
the motif search by specifying a maximum segment stretch in 
Algorithm and ValidPosition. Finally, the function that 
returns the non-overlapping top-fc motifs employs a boolean 
array of size n in order to avoid 0{k^) comparisons between 
members of the queue (cf. @). Notice that we have a memory- 
efficient implementation, as we basically only need to store z 
and the boolean array (both of 0{n) space), M. (of 0{k) 
space, k n), and X, V, S, V, and 0 (all of them of 0{k) 
space, K <C n). The aforementioned hash table (optional) can 
be allocated in any predefined, available memory segment. For 
the sake of brevity, the interested reader is referred to the pro¬ 
vided code for a full account of the outlined implementation 
details. 


C. Variants 

Given the main Algorithm [T] we consider a number of 
variations that may potentially improve SWARMMOTIF’s 
performance without introducing too much algorithmic com¬ 
plexity: 

• Sociability: We study whether a “cognitive-only” model, a 
“social-only” one, or different weightings of the two yield to 
some improvements g4). To do so, we just need to introduce 
a parameter a G [ 0 , 1 ] controlling the degree of ‘sociability’ 
of the particles, and implement lines 1-2 of Algorithm 
instead of Eq. [T] 

• Stochastic: We investigate the use of a random inertia 
weight l l^ . This may alleviate the need of using the same 
Co in different environments, providing a potentially more 
adaptive trade-off between exploration and exploitation (also 
controlled by a in the previous point). To consider this 
variant, we just need to replace line 22 in Algorithm [T] by 
line 3 in Algorithm 

• Velocity clamping: In addition to constriction, we study lim¬ 
iting the maximum velocity of the particles Eg. Empirical 
studies have shown that the simultaneous consideration of a 


Algorithm 3 Variations to Algorithmic sociability (lines 1-2), 
stochastic (line 3), velocity clamping (lines 4-6), and craziness 
(lines 7-10). 


1 

Cl ^ Co^f)(l - a) 


2 

C 2 i — CQcjxy. 


3 

G- (l-2(l-Co))uVi-|-CiUi(g)(Pj- 

Xi)+C2U2<^{Pg-X,) 

4 

v™ge •(— [n, ■u;a7 71, u>a]/2 


5 

for in 


6 

, • / range / range 

Vij G- min(u^- ® ,max(-ti^ ® ,v, 

. 7 )) 

7 

v' ^ Ai in Algorithm |^ 


8 

for Vij in Vi 


9 

if u < p 


10 




constriction factor and velocity clamping results in improved 
performance on certain problems | [46) . To apply velocity 
clamping we add lines 4—6 of Algorithmic between lines 22 
and 23 of Algorithm |C 

• Craziness: We introduce so-called “craziness” or “perturba¬ 
tion” in the particles’ velocities, as initially suggested by 
Kennedy & Eberhart | |45) . In such variant, inspired by the 
sudden direction changes observed in flocking birds, the 
particles’ velocity is altered with a certain probability p, with 
the aim of favoring exploration by increasing directional 
diversity and discouraging premature convergence pT) . We 
coincide with Eg in that, in some sense, this can be seen 
as a mutation operation. To implement craziness we add 
lines 7-10 of Algorithm |C between lines 22 and 23 of 
Algorithm |C 

IV. Evaluation Methodology 

To evaluate SWARMMOTIE’s speed and accuracy we con¬ 
sider plots like the one presented in Eig. |^ As a reference, we 
draw uniform random samples from the motif search space 
and compute their dissimilarities (we take as many samples as 
the length n of the time series). As a baseline, we use the top- 
25 motifs found by MOEN ||g, which we will denote by Af*. 
Existing empirical evidence 0 , eg suggests that MOEN is 
the most efficient algorithm to retrieve the top exact similarity- 
based motifs in a range of lengthE](Sec.|^. Notice furthermore 
that here we are not that interested in obtaining all top-25 
true exact motifs, but more concerned on obtaining good seed 
motifs within these using an anytime approach (Sec. |II-C| l. 

As its competitors, MOEN has however some limitations 
(Sec. 1^. Thus, to fairly compare results, we have to apply 
some constrains to our algorithm. Since MOEN can only 
use the Euclidean distance between z-normalized segments, 
here we also adopt this formulation for D. In addition, as 
MOEN only considers pairs of segments of the same length 
(without resampling), we have to constrain SWARMMOTIE so 

^Besides, we could not find any other promising exact or anytime approach 
with some available code, nor with sufficient detail to allow a reliable 
implementation. 
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TABLE I 

Characteristics of the time series used for evaluation, 

TOGETHER WITH THE CONSIDERED SEGMENT LENGTH RANGES. 


Name 

Duration 

Sample Rate 

n 


Wm-dx 

DowJones 

129 y 

Id-^ 

35503 

30 

300 

CarCount 

175 d 

1/5 min -1 

47534 

270 

320 

Insect 

32 min 

38 Hz 

73929 

300 

500 

EEG 

Ih 

50 Hz 

180214 

150 

250 

Fieldrecording 

3h 

21.5 Hz 

226383 

50 

150 

Wind 

4y 

1/6 min~^ 

369138 

200 

280 

Power 

32 months 

1/5 min~^ 

410608 

250 

620 

EOG 

9h 

25 Hz 

809948 

150 

200 

RandomWalk 

- 

- 

1000000 

100 

250 


that Xi ^2 = Xi^ 4 . Therefore, the reported motif dissimilarities 
D(mi) correspond to the Euclidean distance between two z- 
normalized segments of the same length (we divide by the 
length of the segments to compare different segment lengths, 
Sec. ll-A| l. In the reported experiments, SWARMMOTIF is 
run 10 times with k = 10. We stop its execution when we 
find 95% of I?(nii) within Ai*. This way, we assess the time 
taken to retrieve any 10 motifs from those with at least 95% 
confidence. All experiments are performed using a single core 
of an Intel® Xeon® CPU E5-2620 at 2.00 GHz. 

To demonstrate that SWARMMOTIF is not biased towards 
a particular data source, time series length, or motif length, 
we consider 9 different time series of varying length, coming 
from distinct domains, and a number of arbitrary but source- 
consistent motif lengths (Table |I|. As mentioned, we make 
these time series and our code available online (Sec. [I]). Four of 
the time series have been used for motif discovery in previous 
studies 113 , ij4g, while the other five are employed here for 
the first time for this task: 


1) DowJones: The daily closing values of the Dow Jones 
average in the USA from May 2, 1885 to April 22, 
2014 @. 

2) CarCOUNT: The number of cars measured for the Glen¬ 
dale on ramp for the 101 North freeway in Eos Angeles, 
CA, USA 150) . The measurement was carried out by the 
Freeway Performance Measurement SysteirQ and the data 
was retrieved from the UCI Machine Teaming Repos¬ 
itory 0. Segments of missing values were manually 
interpolated or removed. 

3) Insect: The electrical penetration graph of a beet leafhop- 
per {circulifer tenellus) .El- The time series was retrieved 
from Mueen’s websit^^ 

4) EEG: A one hour electroencephalogram (in /rV) from a 
single channel in a sleeping patient The time series 
was retrieved from Mueen’s websit^and, according to the 
authors, was smoothed and filtered using domain-standard 
procedures. 

5) FieldRecordinG: The spectral centroid (in Hz) of a 
field recording retrieved from Freesouncf^ . We used 
the mean of the stereo channels and the spectral centroid 


(linear frequency) Vamp SDK example plugin from Sonic 
Visualizer |53|. We used a Hann window of 8192 samples 
at 44.1 KHz with 75% overlap. 

6) Wind: The wind speed (in m/s) registered in the buoy of 
Rincon del San Jose, TX, USA, between January 1, 2010 
and April 11, 2014. The time series was retrieved from 
the Texas Coastal Ocean Observation Network websit^H] 
Segments of missing values were manually interpolated or 
removed. 

7) Power: The electricpower consumption (in KW) of an 
individual householcPi The data was retrieved from the 
UCI Machine Feaming Repository | [5T[ . We took the global 
active power, removed missing values, and downsampled 
the original time series by a factor of 5 using averaging 
and 50% overlap. 

8) FOG: An electrooculogram tracking the eye movements 
of a sleeping patient S- We took the downsampled time 
series from Mueen’s web pagj3 

9) Random Walk: A random walk time series. This was 

artificially synthesized using = Zi + N{0, 1) for 

i = 2,...n and Zi — 0, where A^(0,1) is a real Gaus¬ 
sian random number generator with zero mean and unit 
variance. 


To assess the statistical significance of the differences 
between alternative parameter settings, we employ a two stage 
approach. First, we consider all settings at the same time and 
perform the Friedman’s test | [55) ; which is a non-parametric 
statistical test used to detect differences in treatments across 
multiple test attempts. We use as inputs the median values for 
all settings for 25 equally-spaced time steps. In the case some 
difference between settings is detected (i.e., we reject the null 
hypothesis that the settings’ performances come from the same 
distribution), we proceed to the second stage. In it, we perform 
all possible pairwise comparisons between settings using the 
Wilcoxon signed-rank test | [55) ^ another non-parametric statis¬ 
tical hypothesis test used for comparing matched samples. To 
counteract the problem of multiple comparisons and control 
the so-called family-wise error rate, we employ the Holm- 
Bonferroni correction | |5^ . In all statistical tests, we consider 
a significance level of 0.01. 


V. Results 

A. Configuration 

In pre-analysis, and according to common practice, we set 
K — 100, (j) = 4.1, and t = 2000. We then experimented 
with 6 different static topologies 9 | |57) : global best, local best 
(two neighbors). Von Neumann, random (three neighbors), 
wheel, and binary tree. The results showed the qualitative 
equivalence of all topologies except, perhaps, global best 
and wheel (Fig. 13 . In some data sets, these two turned 
out to yield slightly worse performances for short-time runs 
of the algorithm (small t), although for longer runs they 
gradually became equivalent to the rest. However, in general. 


^http://pems.dot.ca.gov 
^ http://www.cs.ucr.edu/~mueen/MK 
^ http://www.es.ucr.edu/^mueen/OnlineMotif 
http://www.freesound.org/people/JeffWojo/sounds/121250 


^ ^ http://lighthouse.tamucc.edu/pq 

http://archive.ics.uci.edu/ml/datasets/Individual+household+electric+ 
power+consumption 

'‘http://www.cs.ucr.edu/~mueen/DAME 












0.1304 



Fig. 4. Effect of 0 on the EEG time series. Equivalent results were observed 
with the other time series. 

no systematic statistically significant difference was detected 
between topologies. With this in mind, we chose the local best 
topology to further favor exploration and parallelism, and to be 
more consistent with our local neighborhood design principle 
of Sec. Ilira 

Next, we studied the effect of the number of particles 
K and the stagnation threshold t. To do so, we kept the 
previous configuration with the local best topology and sub¬ 
sequently evaluated k = {20,40,80,160,320} and r = 
{500,1000,2000,4000,8000}. Essentially, we observed al¬ 
most no performance changes under these alternative settings 
(Figs. 0 and 1^. We only found a statistically significant 
difference in the case of the CarCOUNT data set. Specifically, 
the performance with r = 500 was found to be statistically 
significantly worse than r > 2000. Regarding k, and after 
considering different n, wa and k, a partial tendency seemed 
to emerge; an increasing number of particles k was slightly 
beneficial for increasing lengths n, increasing wa, and increas¬ 
ing k. Unfortunately, we could not obtain strong empirical 
evidence nor formal proof for this statement. Nonetheless, in 
subsequent experiments, we decided to use a value for k and 
T that dynamically adapts SWARMMOTIF’s configuration to 
such predefined task parameters. We arbitrarily set r propor¬ 
tional to K, and K proportional to n and in direct relation 
to WA and k (we refer to the provided code for the exact 
formulation). 

To conclude our pre-analysis, we studied the influence of 
the constriction constant (j). Following common practice, we 
considered (j) = {4.02,4.05,4.1,4.2,4.4,4.8}. In this case, we 
saw that high values had a negative impact on performance 
(Fig. 0- In particular, values of ^ > 4.2 or ^ > 4.4, depending 
on the data set, statistically significantly increased the motif 
dissimilarities at a given t. Contrastingly, values 4 < </) < 4.2 
yielded stable dissimilarities with no statistically significant 
variation (in some data sets, this range could be extended 
to 4 < (/) < 4.4). It is well-known that higher (j) values 
favor exploitation rather than exploration ||43). Hence, it is not 



Fig. 5. Effect of re on the WIND time series. Equivalent results were observed 
with the other time series. 



Us] 


Fig. 6. Effect of r on the FIELDRECORDING time series. Equivalent results 
were observed with the other time series. 


Strange to observe that low (j) values are more appropriate for 
searching the large motif spaces we consider here. We finally 
chose (p = 4.05. 

Overall, the result of our pre-analysis suggests a high degree 
of robustness with respect to the possible configurations. The 
topology 6, the number of particles k, the stagnation threshold 
T, and the constriction constant p have, in general, no signif¬ 
icant influence on the obtained results. The only consistent 
exception is observed for values of </> > 4.4, which are not 
the most common practice p4) . The global best and wheel 
topologies could also constitute a further exception. However, 
as we have shown, these become qualitatively equivalent to 
the rest as execution time t increases, yielding no statistically 
significant difference. We believe that the reported stability 
of SWARMMOTIF against the tested configurations and data 
sets justifies the use of our setting for finding motifs in diverse 
time series coming from further application domains. 
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Fig. 7. Effect of (p on the INSECT time series. Equivalent results were Fig. 8. Effect of p on the EEG time series. Equivalent results were observed 
observed with the other time series. with the other time series. 


B. Variants 


Using the configuration resulting from the previous section, 
we subsequently assessed the performance of the variations 
considered in Sec. 111-C We started with the sociability 
variant, experimenting with social-only models, a = 1, 
cognitive-only models, a — 0, and intermediate configura¬ 
tions, a = {0.2,0.33,0.66,0.8}. Apart from the fact that no 
clear tendency could be observed, none of the previous settings 
was able to consistently reach the performance achieved by the 
original variant (a = 1/2, Eq. in all time series. That is, 
none of the previous settings could statistically significantly 
outperform a = 1/2 in the majority of the data sets. 

Next, we experimented with the stochastic and the velocity 
clamping variants. While the former did not improve our 
results, the latter led to a statistically significant improvement 
for some time series. Because of that, we decided to discard the 
use of a stochastic variant but to incorporate velocity clamping 
to our main algorithm. The former could be difficult to justify 
while the latter has empirical evidence behind it (Sec. |III-C|l. 


Finally, we experimented with craziness and its probability 
p. The results showed a similar performance for 0 < p < 
0.001, a slightly better performance for 0.001 < p < 0.01, and 
an increasingly worse performance for p > 0.01 (Fig. §. A 
statistically significant difference was found between p < 0.01 
and p > 0.1, being p > 0.1 a consistently worse setting. 
These results were expected, as the swarm performs a more 
random search with increasing p, being completely random in 
the limiting case of p = 1. The slightly better performance for 
0.001 < p < 0.01 was not found to be statistically significant 
under our criteria. However, it was visually noticeable for 
some data sets. For instance, with the EEG data set, we see 
that curves 33 and 34 hit the dissimilarities of the true exact 
motif set M.* (gray area) two or three times earlier than curves 
30, 31, and 32 (Fig. [g. With these results, and seeing that p 
values between 0.001 and 0.01 never harmed the performance 
of the algorithm, we chose p = 0.002. 


C. Final Performance 

After extending SWARMMOTIF with velocity clamping 
and craziness, we assess its performance on all considered 
time series using the default parameter combination resulting 
from the previous two sections. As can be seen, the obtained 
motif dissimilarities are far from the random sampling in all 
cases (Fig. notice the logarithmic axes). In addition, we see 
that SWARMMOTIF is able to already obtain dissimilarities 
within A4* as soon as its execution begins. Specifically, motif 
dissimilarities in M. start to overlap the ones in at f < 10 s 
for practically all time series. The only exceptions are FOG 
and Random Walk, where M starts to overlap with Ad* at 
t < 100 s. We hypothesize that taking a smaller number of 
particles k could make A1 overlap with Al* earlier, but leave 
the formal assessment of this hypothesis for future work. 

Finally, as execution time t progresses, we see that 
SWARMMOTIF consistently retrieves lower dissimilarities, up 
to the point that M ~ Ad* (Fig. and Table |n|i. Following 
the condition we specify in Sec^IV| this means that the 
distances in the motif set obtained by SWARMMOTIF are not 
statistically worse than the ones of the true exact motif set. 
With respect to MOEN’s execution time, this happens 1487 
(DowJones), 184 (CarCount), 50 (Insect), 179 (EEG), 
100 (Fieldrecording), 241 (Wind), 286 (Power), 74 
(FOG), and 287 (RandomWalk) times faster. This implies 
between one and three orders of magnitude speedups (more 
than that for DowJONES). Overall, we believe this is an 
extremely competitive performance for an anytime motif dis¬ 
covery algorithm. 


VI. Conclusion 

In this article, we propose an innovative standpoint to the 
task of time series motif discovery by formulating it as an 
anytime multimodal optimization problem. After a concise 
but comprehensive literature review, we reason out the new 
formulation and the development of an approach based on 
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Fig. 9. SWARMMOTIF performance on the considered time series: (A) DowJONES, (B) CarCount, (C) INSECT, (D) EEG, (E), FIELDRECORDING, (F) 
Wind, (G) Power, (H) EOG, and (I) RandomWalk. 
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evolutionary computation. We then highlight the several ad¬ 
vantages of this new formulation, many of which relate to a 
high degree of flexibility of the solutions that come from it. 
To the best of our knowledge, such a degree of flexibility is 
unseen in previous works on time series motif discovery. 

We next present SWARMMOTIF, an anytime multimodal 
optimization algorithm for time series motif discovery based 
on particle swarms. We show that SWARMMOTIF is ex¬ 
tremely competitive when compared to the best approach we 
could find in the literature. It obtains motifs of comparable 
similarities, in considerably less time, and with minimum 
memory requirements. This is confirmed with 9 independent 
real-world time series of increasing length coming from a 
variety of domains. Besides, we find that the high majority 
of the possible implementation choices lead to non-significant 
performance changes in all considered time series. Thus, given 
this robustness, we can think about the proposed solution 
as being parameter-free from the user’s perspective. Over¬ 
all, if we add the aforementioned, unprecedented degree of 


flexibility, SWARMMOTIF stands out as one of the most 
prominent choices for motif discovery in long time series 
streams. Since the used data and code are available online 
(Sec. |I]), the research presented here is fully reproducible, 
and SWARMMOTIF is freely available to researchers and 
practitioners. 

We believe that the consideration of multimodal optimiza¬ 
tion algorithms is a relevant direction for future research in 
the field of time series analysis and mining. Not only with 
regard to motif discovery, but also in other tasks such as 
querying for segments of unknown length p8) or determining 
optimal alignments and similarities p3) . With regard to the 
latter, we envision powerful approaches to variable-length 
local similarity calculations, in the vein of existing dynamic 
programming approaches | |58t , | |59) . Finally, we believe that 
considering the search spaces and the time constraints de¬ 
rived from time series problems can be a challenge for the 
evolutionary computation community. We look forward to 
exploring all these topics in forthcoming works, together with 
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TABLE II 

Comparison between MOEN and SWARMMOTIF results. The latter are taken at the time when the defined stopping criterion is 

MET (Sec. [Tv}. Results eor other time steps are available online (Sec.|^. 


Approach 

Result 

DowJones 

Dodgers 

Insect 

EEG 

Fieldrecording 

Wind 

Power 

EOG 

RandomWalk 

MOEN 

t [s] 

2933.9 

838.0 

9051.1 

21596.0 

76415.0 

70727.2 

672161.9 

430790.3 

430676.6 


D{mi)* median 

0.0147 

0.0260 

0.0140 

0.0119 

0.0301 

0.0117 

0.0199 

0.0206 

0.0091 


D(mi)* min 

0.0084 

0.0225 

0.0102 

0.0081 

0.0240 

0.0102 

0.0102 

0.0137 

0.0080 


D(m^)* max 

0.0191 

0.0279 

0.0168 

0.0131 

0.0347 

0.0124 

0.0232 

0.0218 

0.0094 

SWARMMOTIF 

t [s] 

2.0 

4.6 

180.0 

120.6 

764.8 

293.1 

2346.5 

5807.7 

1497.3 


7)(mi) median 

0.0132 

0.0259 

0.0135 

0.0119 

0.0315 

0.0117 

0.0214 

0.0195 

0.0090 


7)(mi) S'*’ percentile 

0.0087 

0.0234 

0.0121 

0.0090 

0.0239 

0.0102 

0.0142 

0.0163 

0.0079 


D(m.i) 95'*’ percentile 

0.0182 

0.0278 

0.0162 

0.0130 

0.0341 

0.0122 

0.0230 

0.0212 

0.0092 


other multimodal optimization techniques that could be easily 
mapped to the problem of time series motif discovery. 
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